Microarchitectural Miss/Execute Decoupling
نویسندگان
چکیده
The decoupled access/execute architecture described a machine that enables the access of memory values to be decoupled from the consumption of those values. Although never widely adopted in its original form, the decoupled design is a compelling way to tolerate memory latency. In this paper, we propose and demonstrate a novel implementation of decoupling, one based on the following two refinements of the original idea. First, because the latency of cache hits can generally be tolerated, we only decouple from the main program accesses that are likely to miss in the cache. Second, our decoupling takes place at the microarchitectural level, not the architectural level. By treating the access stream as a speculative thread and not allowing it to modify the architectural state of the machine, we relax the correctness constraints that were placed on it in the original design. For many programs, this added flexibility enables a level of decoupling and, consequently, latency tolerance that could not be achieved under the more constrained architectural model.
منابع مشابه
The Synergy of Multithreading and Access/Execute Decoupling
This work presents and evaluates a novel processor microarchitecture which combines two paradigms: access/ execute decoupling and simultaneous multithreading. We investigate how both techniques complement each other: while decoupling features an excellent memory latency hiding efficiency, multithreading supplies the in-order issue stage with enough ILP to hide the functional unit latencies. Its...
متن کاملImproving Latency Tolerance of Multithreading through Decoupling
ÐThe increasing hardware complexity of dynamically scheduled superscalar processors may compromise the scalability of this organization to make an efficient use of future increases in transistor budget. SMT processors, designed over a superscalar core, are therefore directly concerned by this problem. This work presents and evaluates a novel processor microarchitecture which combines two paradi...
متن کاملMultipldpass Pipelining: Enhancing In-order Microarchitectures to Out-of-order Performance
Out-of-program-order execution has become almost a ubiquitous characteristic of modern processors because of its ability to tolerate variable memory-instruction latency. As designs are becoming increasingly power-conscious, the cost and complexity of the components of out-of-order execution are becoming problematic. Compilers have generally proven adept at planning useful static instruction-lev...
متن کاملMultithreaded Decoupled Access/Execute Processors
This work presents and evaluates a novel processor microarchitecture which combines two paradigms: access/execute decoupling and simultaneous multithreading. We investigate how both techniques complement each other in the design of high performance next generation ILP processors. While decoupling features an excellent memory latency hiding efficiency, simultaneous multithreading supplies the in...
متن کاملPerformance and power effectiveness in embedded processors customizable partitioned caches
This paper explores an application-specific customization technique for the data cache, one of the foremost area/power consuming and performance determining microarchitectural features of modern embedded processors. The automated methodology for customizing the processor microarchitecture that we propose results in increased performance, reduced power consumption and improved determinism of cri...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2000